Multilevel proteomic analyses reveal molecular diversity between diffuse-type and intestinal-type gastric cancer

Diffuse-type gastric cancer (DGC) and intestinal-type gastric cancer (IGC) are the major histological types of gastric cancer (GC). The molecular mechanisms underlying the differences between DGC and IGC are poorly understood. In this research, we carry out multilevel proteomic analyses, including proteome, phospho-proteome, and transcription factor (TF) activity profiles, of 196 cases covering DGC and IGC in Chinese patients. Integrative proteogenomic analysis reveals that ARID1A mutation is associated with opposite prognostic effects in DGC and IGC, via diverse influences on their corresponding proteomes. Systematic comparison and consensus clustering analysis identify three subtypes each of DGC and IGC, based on distinct expression patterns of cell cycle, extracellular matrix organization, and immune response-related proteins. TF activity-based subtypes demonstrate that the disease progression of DGC and IGC is regulated by the SWI/SNF and NFKB complexes. Furthermore, inferred immune cell infiltration and immune clustering show that the Th1/Th2 ratio is an indicator of immunotherapeutic effectiveness, which is validated in an independent group of GC patients receiving anti-PD1 therapy. Our multilevel proteomic analyses enable a more comprehensive understanding of GC and can further advance precision medicine.


Software and code
Policy information about availability of computer code
Data collection

Data analysis

Chen Ding
Nov 20, 2022
MS raw files were processed with the Firmiana (version 1.0) proteomics workstation. Raw files were searched against the NCBI human RefSeq protein database (released on 04-07-2013, 32,015 entries) in the Mascot search engine (version 2.3, Matrix Science Inc). Statistical analyses, including the Chi-square test, Fisher's exact test, Wilcoxon rank-sum test, Wilcoxon signed-rank test, one-way ANOVA, hierarchical clustering analysis, Benjamini-Hochberg (BH) correction, and principal component analysis (PCA), were performed in R (v4.0.4). Analysis of dominant biological processes was performed with the Gene Set Enrichment Analysis (GSEA) software (v4.1.0) or the R package clusterProfiler (v3.18.1). Survival analysis was performed with the R package survival (v3.2-11) or GraphPad 6.0. Consensus clustering was performed using the R package ConsensusClusterPlus (v1.54.0). Cell cycle phase analysis was performed using the R package Seurat (v4.0.1). For the optimal cutoff point in the K-M analysis of a given protein, we used the "survminer" package (v0.4.9). Changes in a kinase's activity were estimated with the Kinase-Substrate Enrichment Analysis (KSEA) app (v1.0).
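The Benjamini-Hochberg (BH) correction listed above can be illustrated with a short sketch. This is not the authors' code (their analyses were run in R); it is a minimal standard-library Python version of the usual BH adjustment, and the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input.

    Illustrative sketch only; the p-values used below are invented.
    """
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by m / rank
    # and enforcing that adjusted values never increase as rank decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


example_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(example_p)
print(adjusted_p)  # approximately [0.04, 0.0533, 0.0533, 0.2]
```

Walking from the largest p-value downward keeps the adjusted values monotone in rank without a second pass.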




Life sciences study design
All studies must disclose on these points even when the disclosure is negative.
No statistical method was used to predetermine the sample size. The sample size was based on published papers in the proteomic field (PMID: 29520031; 30814741; 32649877). The saturation curve (Supplemental Figure 1F) and the statistical differences among the subtypes (Figure 4) support the sufficiency of the sample size.
For sample data, the distribution of median values was used to identify samples with insufficient protein or phospho-site detection. Samples with median values larger than the upper quartile + 1.5 × IQR (interquartile range) were excluded from further analyses. To evaluate the comparability of the data, we compared the data distributions using boxplots and density curves. Samples with a clearly bimodal distribution of protein quantification were excluded from further analyses. Furthermore, both the tumor tissue and the paired NAT were required to pass the QC procedures.
[...] spectrometer platforms. The high average Spearman's correlation coefficient showed the stability of our MS platforms. In addition, the proteomic subtypes of DGC and IGC were identified by consensus clustering in the bioinformatic analysis, in which clustering was repeated 1,000 times; this provided a stable classification. For intergroup comparisons, each group contained replicates, which mitigated heterogeneity among patients. The number of patients in each group is indicated in the Results and figure legends.
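The median-based exclusion rule described above (median above the upper quartile + 1.5 × IQR) can be sketched as follows. This is an illustrative reading, not the authors' pipeline: it assumes one median per sample, with the quartiles computed across those per-sample medians. The function name `flag_outlier_samples`, the sample IDs, and all values are hypothetical. The quartile method is `statistics.quantiles(..., method="inclusive")`, which matches the default quantile definition in R.

```python
from statistics import median, quantiles


def flag_outlier_samples(sample_values):
    """Return IDs of samples whose median exceeds Q3 + 1.5 * IQR.

    sample_values maps a sample ID to its list of quantification values.
    Q1 and Q3 are computed over the per-sample medians (an assumption).
    """
    medians = {sid: median(vals) for sid, vals in sample_values.items()}
    q1, _, q3 = quantiles(list(medians.values()), n=4, method="inclusive")
    upper_fence = q3 + 1.5 * (q3 - q1)
    return sorted(sid for sid, med in medians.items() if med > upper_fence)


# Invented toy data: six samples, one with a clearly shifted distribution.
samples = {
    "s1": [9, 10, 11],
    "s2": [10, 11, 12],
    "s3": [11, 12, 13],
    "s4": [12, 13, 14],
    "s5": [13, 14, 15],
    "s6": [29, 30, 31],
}
flagged = flag_outlier_samples(samples)
print(flagged)  # ['s6']
```

In this toy case the medians are 10, 11, 12, 13, 14 and 30, giving Q1 = 11.25, Q3 = 13.75 and an upper fence of 17.5, so only `s6` is flagged.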
We performed consensus clustering for proteomic subtyping using the R package ConsensusClusterPlus (v1.54.0). Samples were clustered with 1,000 resampling repetitions over a range of 2 to 6 clusters. The consensus CDF and delta plots were used to identify the number of clusters giving the clearest separation, and samples were then allocated to the corresponding groups. All survival analyses of the proteomic subtypes were adjusted for other clinical covariates, including gender, age, TNM stage and chemotherapy.
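The idea behind resampling-based consensus clustering can be sketched in a few lines. This is not ConsensusClusterPlus and not the authors' code: it is a standard-library Python toy that repeatedly subsamples the items, clusters each subsample, and records how often each pair of items lands in the same cluster. Several choices here are assumptions made for illustration: a tiny k-means stands in for the clustering algorithm, 80% of items are drawn per repetition, and the function names and data are invented. The CDF and delta plots used to choose the number of clusters are not reproduced.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """Tiny Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, reps=100, frac=0.8, seed=0):
    """Fraction of co-sampled repetitions in which items i and j co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(frac * n))
    for _ in range(reps):
        idx = rng.sample(range(n), size)
        labels = kmeans_labels([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Invented toy data: two well-separated groups of four one-dimensional points.
toy_points = [[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.2], [5.3]]
cm = consensus_matrix(toy_points, k=2, reps=100)
print(cm[0][1], cm[0][4])  # within-group vs. between-group consensus
```

A consensus value near 1 means two samples almost always cluster together across resamplings, and a value near 0 means they almost never do. In practice this would be repeated for each candidate number of clusters (2 to 6 in the text above) and the resulting matrices compared.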
The investigators who performed sample processing and measured protein expression by mass spectrometry were blinded to patient information.
Cell line authentication using short tandem repeat (STR) markers was performed by Meixuan Biological Science and Technology Ltd. (Shanghai). In detail, the species of origin of these cell lines was first tested by PCR using extracted total genomic DNA, and the cell lines were then examined by STR profiling. STR data were analyzed using the DSMZ (German Collection of Microorganisms and Cell Cultures) online STR database (http://www.dsmz.de/fp/cgi-bin/str.html).
The cell lines tested negative for mycoplasma contamination.
No commonly misidentified cell lines were used.